Analyze Amazon S3 Storage Costs with AWS Cost and Usage Reports, S3 Inventory, and Amazon Athena

Since its inception in 2006, Amazon Simple Storage Service (S3) has seen significant expansion, accommodating a wide range of use cases including website hosting, data lake creation, object storage for applications, log storage, and data archiving. As organizations develop their application portfolios, they often consolidate data from various applications and business functions into single S3 buckets, leading to massive storage sizes that can reach hundreds of TBs. While the AWS Billing console allows users to view the overall storage costs associated with Amazon S3, IT departments frequently require a deeper understanding of cost distributions across specific S3 buckets based on different prefixes or objects linked to particular users or applications. Analyzing S3 bucket costs can serve various purposes, including expense breakdown identification, internal chargebacks, and understanding cost allocations by business unit and application. Currently, there isn’t a straightforward method to dissect S3 bucket costs by objects and prefixes.

In this article, we present a solution utilizing Amazon Athena to query AWS Cost and Usage Reports and Amazon S3 Inventory reports to analyze costs by prefixes and objects within an S3 bucket.

Solution Overview

The architecture for this solution is depicted in the following figure. First, we enable AWS Cost and Usage Reports (AWS CUR) and Amazon S3 Inventory features, saving the outputs into two distinct pre-created S3 buckets. We then leverage Athena to query these S3 buckets for AWS CUR data and S3 object inventory data, enabling us to correlate and distribute the cost breakdown at the object or prefix level.
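One way to picture the correlation step is a proportional split: the bucket's total cost from AWS CUR is divided among prefixes according to the bytes each prefix holds in the S3 Inventory. The sketch below illustrates that idea under this assumption only; the function name, the sample keys, and the 100.0 total are hypothetical, and a real bill also contains request and transfer charges that do not scale with stored size.

```python
from collections import defaultdict


def allocate_cost_by_prefix(inventory_rows, bucket_cost):
    """Split bucket_cost across top-level prefixes in proportion to stored bytes.

    inventory_rows: iterable of (object_key, size_in_bytes) pairs, a
    hypothetical in-memory form of rows read from an S3 Inventory report.
    """
    bytes_per_prefix = defaultdict(int)
    for key, size in inventory_rows:
        if "/" in key:
            # Use the first key segment as the prefix, e.g. "team-a/".
            prefix = key.split("/", 1)[0] + "/"
        else:
            # Objects stored at the bucket root have no prefix.
            prefix = "(root)"
        bytes_per_prefix[prefix] += size

    total_bytes = sum(bytes_per_prefix.values())
    if total_bytes == 0:
        return {}
    return {
        prefix: bucket_cost * size / total_bytes
        for prefix, size in bytes_per_prefix.items()
    }


# Hypothetical sample data: three objects and a bucket cost of 100.0.
rows = [
    ("team-a/logs/app.log", 600),
    ("team-a/data.csv", 200),
    ("team-b/archive.zip", 200),
]
shares = allocate_cost_by_prefix(rows, bucket_cost=100.0)
for prefix, cost in sorted(shares.items()):
    print(f"{prefix} {cost:.2f}")
```

With this sample, team-a/ holds 800 of 1,000 bytes and receives 80% of the cost. The same ratio can be computed in Athena by joining the inventory table to the bucket-level CUR total.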

To implement this solution, follow these steps:

  1. Create S3 buckets for AWS CUR, S3 object inventory, and Athena results. While you can create these buckets when enabling their respective features, we will create them at the outset for this tutorial.
  2. Activate the Cost and Usage Reports.
  3. Configure the Amazon S3 Inventory.
  4. Set up AWS Glue Data Catalog tables for the CUR and S3 object inventory to enable querying through Athena.
  5. Execute queries in Athena.
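As a rough illustration of the final step, the sketch below builds a query that totals the CUR cost recorded for one bucket and shows how it might be submitted with the AWS SDK for Python (boto3). The table name cur_table and both helper functions are hypothetical. The column names follow the CUR Athena schema, where S3 line items carry the bucket name in line_item_resource_id when resource IDs are included. Only the query builder runs here; run_query needs boto3 and AWS credentials.

```python
import re


def build_bucket_cost_query(table, bucket):
    """Return SQL that totals the unblended CUR cost for one S3 bucket."""
    # Both names are interpolated into SQL, so validate them first.
    if not re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", bucket):
        raise ValueError(f"unexpected bucket name: {bucket!r}")
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_.]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    return (
        "SELECT line_item_resource_id AS bucket_name,\n"
        "       SUM(line_item_unblended_cost) AS total_cost\n"
        f"FROM {table}\n"
        "WHERE line_item_product_code = 'AmazonS3'\n"
        f"  AND line_item_resource_id = '{bucket}'\n"
        "GROUP BY line_item_resource_id"
    )


def run_query(sql, database, output_location):
    """Submit the query to Athena and return its execution ID."""
    import boto3  # AWS SDK for Python, imported lazily; requires credentials

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# "cur_table" is a placeholder for whatever table you create for the CUR data.
query = build_bucket_cost_query("cur_table", "s3-object-cost-allocation")
print(query)
```

The output location passed to run_query would be an s3:// path in the Athena results bucket created for this tutorial.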

Prerequisites

Before proceeding with this walkthrough, ensure you have the following prerequisites:

  • An AWS account.
  • AWS Identity and Access Management (IAM) permissions for:
    • Amazon S3 – to create and manage S3 buckets.
    • AWS Billing and Cost Management – to create Cost and Usage Reports.
    • Athena – to create tables and execute queries. Permissions for AWS Glue Data Catalog are also required to create tables.

Create S3 Buckets

Amazon S3 is a leading object storage service that offers exceptional scalability, data availability, security, and performance. It enables customers across various sectors to store and safeguard any amount of data for nearly any purpose, including data lakes and cloud-native applications. With cost-effective storage classes and user-friendly management features, businesses can optimize expenses, organize data, and configure specific access controls to meet unique requirements.

For this tutorial, we use the S3 bucket s3-object-cost-allocation as our primary bucket for cost allocation. This bucket is structured to include multiple prefixes and objects of varying sizes for which we need to allocate costs based on the total expenses associated with the bucket. In a practical scenario, you should select a bucket containing data from multiple teams, necessitating cost allocation by prefix or object. We will refer to this bucket as the primary object bucket.

Next, we’ll create three additional operational S3 buckets to store datasets necessary for calculating object costs. You can create these or use existing buckets as needed:

  • cur-cost-usage-reports- – This bucket will hold the Cost and Usage Reports for your account.
  • s3-inventory-configurations- – This bucket will store the inventory reports generated for our primary object bucket.
  • athena-query-bucket- – This bucket will save the query results from Athena.

To create your S3 buckets, complete the following steps:

  1. Go to the Amazon S3 console and select Buckets in the navigation pane.
  2. Click on Create bucket.
  3. Enter a name for your bucket, such as cur-cost-usage-reports-.
  4. Choose your preferred AWS Region.
  5. Leave all other settings at their default values (or adjust according to your organization’s standards).
  6. Click on Create bucket.

Repeat these steps to create s3-inventory-configurations- and athena-query-bucket-.
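If you prefer to script the console steps, the sketch below shows one possible approach with boto3. The bucket names in the text end in a hyphen, so the suffix is left as a parameter here; the value "example" is a placeholder and not part of the article. The helper names are hypothetical, and create_buckets is defined but not called because it needs boto3 and AWS credentials.

```python
def bucket_request(name, region):
    """Build keyword arguments for the S3 CreateBucket call."""
    kwargs = {"Bucket": name}
    # us-east-1 is the default location and must not be sent as a constraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_buckets(names, region):
    """Create each bucket in the given Region with default settings."""
    import boto3  # AWS SDK for Python, imported lazily; requires credentials

    s3 = boto3.client("s3", region_name=region)
    for name in names:
        s3.create_bucket(**bucket_request(name, region))


def operational_bucket_names(suffix):
    """Return the three operational bucket names with a caller-supplied suffix."""
    bases = [
        "cur-cost-usage-reports-",
        "s3-inventory-configurations-",
        "athena-query-bucket-",
    ]
    return [base + suffix for base in bases]


# "example" is a placeholder suffix; S3 bucket names must be globally unique.
names = operational_bucket_names("example")
for name in names:
    print(bucket_request(name, "us-west-2"))
```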

Enable Cost and Usage Reports

The AWS Cost and Usage Reports (AWS CUR) provide the most detailed set of cost and usage data available. You can utilize these reports to publish your AWS billing data to an S3 bucket owned by you. They can break down your costs by hour, day, or month; by product or resource; or by custom tags you define.

To enable Cost and Usage Reports for your account, follow these steps:

  1. In the AWS Billing console, navigate to Cost & Usage Reports.
  2. Click on Create report.
  3. Name your report, for example, account-cur-s3.
  4. For Additional report details, select Include resource IDs to add individual resource IDs to the report. This option creates a separate line item for each resource, which can significantly increase the size of your Cost and Usage Reports files, and therefore their S3 storage cost, depending on your AWS usage. This option is required for this tutorial.
  5. For Data refresh settings, choose whether the Cost and Usage Reports should refresh if AWS applies refunds, credits, or support fees to your account after your bill is finalized.
  6. Click Next.
  7. For S3 bucket, select Configure.
  8. Choose an existing bucket from the previous section (cur-cost-usage-reports-) and click Next.
  9. Review the bucket policy, confirm its accuracy, and click Save. This default bucket policy allows Cost and Usage Reports to write data to Amazon S3.
  10. Enter cur-data/account-cur-daily for the Report path prefix.
  11. Choose Daily for Time granularity.
  12. Select Overwrite existing report for Report versioning.
  13. Select Amazon Athena for Enable report data integration for.
  14. Click Next.
  15. After reviewing your report settings, click Review and Complete.
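The console choices above map fairly directly onto the report definition accepted by the Cost and Usage Reports API. The sketch below is one possible boto3 equivalent, not the article's own method. The helper names and the bucket and Region arguments are hypothetical, and RefreshClosedReports is set to True as an assumption because the steps leave that choice to you. The Athena integration requires Parquet output with report overwriting. The call also assumes the bucket policy from step 9 is already in place.

```python
def cur_report_definition(bucket, bucket_region):
    """Build a report definition mirroring the console settings above."""
    return {
        "ReportName": "account-cur-s3",
        "TimeUnit": "DAILY",
        # The Athena integration uses Parquet for both format and compression.
        "Format": "Parquet",
        "Compression": "Parquet",
        # RESOURCES corresponds to the "Include resource IDs" option.
        "AdditionalSchemaElements": ["RESOURCES"],
        "S3Bucket": bucket,
        "S3Prefix": "cur-data/account-cur-daily",
        "S3Region": bucket_region,
        "AdditionalArtifacts": ["ATHENA"],
        # Assumption: refresh closed reports when refunds or credits apply.
        "RefreshClosedReports": True,
        "ReportVersioning": "OVERWRITE_REPORT",
    }


def create_report(definition):
    """Register the report definition with AWS."""
    import boto3  # AWS SDK for Python, imported lazily; requires credentials

    # The Cost and Usage Reports API endpoint is hosted in us-east-1.
    cur = boto3.client("cur", region_name="us-east-1")
    cur.put_report_definition(ReportDefinition=definition)


# Placeholder bucket name and Region; substitute your own values.
definition = cur_report_definition("cur-cost-usage-reports-example", "us-west-2")
for field, value in definition.items():
    print(f"{field}: {value}")
```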

The first Cost and Usage Reports are delivered to the S3 bucket within 24 hours.

For a detailed overview of the Cost and Usage Report, you can find more information in another blog post here.

Enable Amazon S3 Inventory Configuration

Amazon S3 Inventory is one of the tools Amazon S3 provides to help manage storage costs effectively.
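The text does not list the inventory settings, so the following is only a sketch of what a configuration might look like with boto3. The daily schedule, Parquet format, optional fields, configuration ID, account ID, and destination bucket name are all assumptions chosen for illustration. The destination bucket also needs a policy that lets Amazon S3 write reports to it, which is not shown here.

```python
def inventory_configuration(destination_bucket, account_id, config_id="daily-inventory"):
    """Build an S3 Inventory configuration that records object sizes."""
    return {
        "Id": config_id,
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},
        # Size is the field needed to apportion cost across objects.
        "OptionalFields": ["Size", "StorageClass", "LastModifiedDate"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": f"arn:aws:s3:::{destination_bucket}",
                "AccountId": account_id,
                "Format": "Parquet",
            }
        },
    }


def enable_inventory(source_bucket, config):
    """Attach the inventory configuration to the source bucket."""
    import boto3  # AWS SDK for Python, imported lazily; requires credentials

    s3 = boto3.client("s3")
    s3.put_bucket_inventory_configuration(
        Bucket=source_bucket,
        Id=config["Id"],
        InventoryConfiguration=config,
    )


# Placeholder destination bucket and account ID; substitute your own values.
config = inventory_configuration("s3-inventory-configurations-example", "123456789012")
print(config["Destination"]["S3BucketDestination"]["Bucket"])
```

Calling enable_inventory("s3-object-cost-allocation", config) would then apply the configuration to the primary object bucket.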

In summary, analyzing Amazon S3 storage costs is essential for effective budget management and operational efficiency. This guide outlines a clear approach to utilizing AWS tools like Cost and Usage Reports and Athena for detailed cost analysis.
